NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

CTSyn: A Foundational Model for Cross Tabular Data Generation

Lin, Xiaofeng; Xu, Chenheng; Yang, Matthew; Cheng, Guang (April 2025, International Conference on Learning Representations)

Free, publicly-accessible full text available April 24, 2026
Rate-Optimal Rank Aggregation with Private Pairwise Rankings

https://doi.org/10.1080/01621459.2025.2484843

Xu, Shirong; Sun, Will Wei; Cheng, Guang (April 2025, Journal of the American Statistical Association)

Free, publicly-accessible full text available April 3, 2026
DEREC-SIMPRO: unlock Language Model benefits to advance Synthesis in Data Clean Room

Kwok, Thomas; Wang, ChiHua; Cheng, Guang (December 2024, ACM ICAIF GenAI and Synthetic Data Workshop)

Full Text Available
A Survey on Statistical Theory of Deep Learning: Approximation, Training Dynamics, and Generative Models

https://doi.org/10.1146/annurev-statistics-040522-013920

Suh, Namjoon; Cheng, Guang (November 2024, Annual Review of Statistics and Its Application)

In this article, we review the literature on statistical theories of neural networks from three perspectives: approximation, training dynamics, and generative models. In the first part, results on excess risks for neural networks are reviewed in the nonparametric framework of regression. These results rely on explicit constructions of neural networks, leading to fast convergence rates of excess risks. Nonetheless, their underlying analysis only applies to the global minimizer in the highly nonconvex landscape of deep neural networks. This motivates us to review the training dynamics of neural networks in the second part. Specifically, we review articles that attempt to answer the question of how a neural network trained via gradient-based methods finds a solution that can generalize well on unseen data. In particular, two well-known paradigms are reviewed: the neural tangent kernel and mean-field paradigms. Last, we review the most recent theoretical advancements in generative models, including generative adversarial networks, diffusion models, and in-context learning in large language models from two of the same perspectives, approximation and training dynamics.
more » « less
Full Text Available
Transfer Learning for Diffusion Models

Ouyang, Yidong; Xie, Liyan; Zha, Hongyuan; Cheng, Guang (December 2024, NeurIPS)

Full Text Available
Data Plagiarism Index: Characterizing the Privacy Risk of Data-Copying in Tabular Generative Models

Ward, Joshua; Wang, Chihua; Cheng, Guang (August 2024, KDD Workshop on GenAI Evaluation)

Full Text Available
Advancing Retail Data Science: Comprehensive Evaluation of Synthetic Data

Xia, Yu; Wang, Chihua; Marby, Joshua; Cheng, Guang (August 2024, KDD Workshop on GenAI Evaluation)

Full Text Available
Two-sided Competing Matching Recommendation MarketsWith Quota and Complementary Preferences Constraints

Li, Yuantong; Cheng, Guang; Dai, Xiaowu (July 2024, ICML)

Full Text Available
FairRR: Pre-Processing for Group Fairness through Randomized Response

Zeng, Xianli; Ward, Joshua; Cheng, Guang (May 2024, Artificial Intelligence and Statistics)

Full Text Available
Better Representations via Adversarial Training in Pre-Training: A Theoretical Perspective

Xing, Yue; Lin, Xiaofeng; Song, Qifan; Xu, Yi; Zeng, Belinda; Cheng, Guang (May 2024, Artificial Intelligence and Statistics)

Full Text Available

« Prev Next »

Search for: All records